Unsupervised Text Annotation

Authors

  • Tanya Braun
  • Felix Kuhr
  • Ralf Möller
Abstract

We introduce UTA, an unsupervised text annotation model that iteratively populates a document-specific database holding the symbolic content description of that document. The model identifies the most closely related documents using both the text of the documents and their symbolic content descriptions. UTA then extends the database of one document with data from those related documents without sacrificing precision.
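The loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the data layout, or the stopping rule, so the `Document` class, the Jaccard-based `similarity` score, and the `top_k`, `min_sim`, and `max_rounds` parameters below are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document with its text and a set of symbolic annotations (assumed layout)."""
    name: str
    text: str
    annotations: set = field(default_factory=set)


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(d1, d2, text_weight=0.5):
    """Blend word overlap and annotation overlap (an assumed scoring rule)."""
    words1 = set(d1.text.lower().split())
    words2 = set(d2.text.lower().split())
    text_sim = jaccard(words1, words2)
    anno_sim = jaccard(d1.annotations, d2.annotations)
    return text_weight * text_sim + (1 - text_weight) * anno_sim


def annotate(target, corpus, top_k=2, min_sim=0.2, max_rounds=5):
    """Iteratively copy annotations from the most related documents.

    min_sim is an assumed precision guard: documents scoring below it
    never contribute annotations to the target.
    """
    for _ in range(max_rounds):
        # Rank all other documents by their similarity to the target.
        scored = sorted(
            ((similarity(target, d), d) for d in corpus if d is not target),
            key=lambda pair: pair[0],
            reverse=True,
        )
        new_items = set()
        for score, doc in scored[:top_k]:
            if score >= min_sim:
                new_items |= doc.annotations - target.annotations
        if not new_items:
            break  # nothing left to add, so the database has stabilized
        target.annotations |= new_items
    return target.annotations


# Toy corpus with made-up documents and annotation labels.
t = Document("t", "probabilistic relational model inference")
r = Document("r", "probabilistic relational model learning", {"topic:StaRAI"})
u = Document("u", "cooking pasta recipes", {"topic:Food"})
result = annotate(t, [t, r, u])
```

In this toy run the target document picks up the annotation of the textually related document, while the unrelated one stays below the threshold and contributes nothing. Because similarity is recomputed after each round, newly added annotations can change which documents count as related in the next round, which is one plausible reading of "iteratively" in the abstract.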

Related resources

Evaluating Term-Expansion for Unsupervised Image Annotation

Automatic image annotation (AIA) deals with the problem of automatically providing images with labels/keywords that describe their visual content. Unsupervised AIA methods are often preferred because they can assign (virtually) any possible concept to images and, unlike their supervised counterparts, do not require labeled data. Unsupervised AIA methods use a reference collection of images with ass...

Image-Text Dataset Generation for Image Annotation and Retrieval

This paper presents a new dataset of images gathered from the Web with corresponding text obtained from the webpages near where the images appeared. Already extracted features are provided to ease the dataset usage for other researchers. An initial release of 250,000 images is targeted at automatic image annotation with unsupervised data. This dataset is the one being used for the ImageCLEF 201...

News Image Annotation on a Large Parallel Text-image Corpus

In this paper, we present a multimodal parallel text-image corpus, and propose an image annotation method that exploits the textual information associated with images. Our corpus contains news articles composed of a text, images and image captions, and is significantly larger than the other news corpora proposed in image annotation papers (27,041 articles and 42,568 captioned images). In our e...

Painless Labeling with Application to Text Mining

Labeled data is not readily available for many natural language domains, and it typically requires expensive human effort with considerable domain knowledge to produce a set of labeled data. In this paper, we propose a simple unsupervised system that helps us create a labeled resource for categorical data (e.g., a document set) using only fifteen minutes of human input. We utilize the labeled r...

On Automatic Annotation of Images with Latent Space Models

Image auto-annotation, i.e., the association of words to whole images, has attracted considerable attention. In particular, unsupervised, probabilistic latent variable models of text and image features have shown encouraging results, but their performance with respect to other approaches remains unknown. In this paper, we apply and compare two simple latent space models commonly used in text an...

Publication date: 2017